Professional wrestling, while not everyone’s cup of tea, is big business. What started as a carnival act has turned into a global entertainment industry. Netflix recently started showing Monday Night Raw, a program from the biggest North American wrestling company, WWE – this deal is reportedly worth $5 billion. Like any large entity, WWE is not without competition, drama, and scandal.
General Tips
This is very much a step-by-step process. Don’t go crazy trying to get everything done with as few lines as possible. Read the documentation for the AlphaVantage API! Carefully explore the pages from cagematch. There isn’t a need to get too fancy with anything here – just go with simple functions and all should be good. Don’t print comments, but use normal text for explanations.
Step 1
In the calls folder, you’ll find 4 text files – these are transcripts from quarterly earnings calls. Read those files in (glob.glob will be very helpful here), with appropriate column names for ticker, quarter, and year columns; this should be done within a single function. Perform any data cleaning that you find necessary.
Company Participants ticker quarter year
0 James Marsh - IR EDR q3 2023
1 Ari Emanuel - CEO EDR q3 2023
2 Jason Lublin - CFO EDR q3 2023
3 Mark Shapiro - President and COO EDR q3 2023
4 Conference Call Participants EDR q3 2023
.. ... ... ... ...
133 So in reverse order, ad share deal with Twitch... WWE q2 2023
134 Seth Zaslow WWE q2 2023
135 Well, thank you, everyone, for joining us on t... WWE q2 2023
136 Operator WWE q2 2023
137 This concludes today's call. Thank you again f... WWE q2 2023
[701 rows x 4 columns]
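A minimal sketch of the single-function read described in Step 1. The file naming pattern `TICKER_qN_YYYY.txt` (for example `WWE_q2_2023.txt`), the helper names `parse_call_filename` and `load_calls`, and the `text` column name are all assumptions made for illustration; adjust the pattern to whatever the files in the calls folder are actually called.

```python
import glob
import os
import re


def parse_call_filename(path):
    """Split an assumed name like 'WWE_q2_2023.txt' into (ticker, quarter, year)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = re.fullmatch(r"([A-Za-z]+)_(q[1-4])_(\d{4})", stem)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    ticker, quarter, year = match.groups()
    return ticker.upper(), quarter, int(year)


def load_calls(folder="calls"):
    """Read every transcript in `folder`, one row per non-blank line."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        ticker, quarter, year = parse_call_filename(path)
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                text = line.strip()
                if text:  # basic cleaning: drop blank lines
                    rows.append(
                        {"text": text, "ticker": ticker,
                         "quarter": quarter, "year": year}
                    )
    return rows
```

The function returns plain dictionaries, so `pd.DataFrame(load_calls("calls"))` would give a frame with one row per transcript line plus the ticker, quarter, and year columns.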
Step 2
Use the AlphaVantage API to get daily stock prices for WWE and related tickers for the last 5 years – pay attention to your data. You cannot use any AlphaVantage packages (i.e., you can only use requests to grab the data). Tell me about the general trend that you are seeing. I don’t care which viz package you use, but plotly is solid and plotnine is good for ggplot2 users.
The general trend is positive. The drop occurred when WWE merged into TKO, and the price bounced back directly afterward. In the last year, growth looks close to exponential. This follows Paul Levesque (Triple H) fully taking over the company from Vince McMahon.
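The request behind this step can be sketched with requests alone. The sketch assumes the `TIME_SERIES_DAILY` endpoint with `outputsize=full` and the `"Time Series (Daily)"` / `"4. close"` keys of its JSON response; the function names `fetch_daily` and `parse_daily` are hypothetical, and it is worth checking the AlphaVantage documentation for which options a given key allows.

```python
from datetime import date, timedelta

API_URL = "https://www.alphavantage.co/query"


def fetch_daily(symbol, api_key):
    """Request the daily price history for one ticker and return the JSON."""
    import requests  # imported here so parse_daily can be used offline

    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "outputsize": "full",
        "apikey": api_key,
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def parse_daily(payload, today=None, years=5):
    """Return [(date, close)] for roughly the last `years` years, oldest first."""
    series = payload.get("Time Series (Daily)")
    if series is None:
        # Rate-limit and error responses arrive as JSON without the series key.
        raise ValueError(f"no daily series in response: {payload}")
    today = today or date.today()
    cutoff = today - timedelta(days=365 * years)  # approximate 5-year window
    rows = []
    for day, values in series.items():
        parsed = date.fromisoformat(day)
        if parsed >= cutoff:
            rows.append((parsed, float(values["4. close"])))
    return sorted(rows)
```

Calling `parse_daily(fetch_daily("WWE", key))` for each ticker gives rows that can go straight into a data frame for plotting. Paying attention to the data matters here: the WWE series should stop at the TKO merger, so a related ticker such as TKO is needed to cover the rest of the window.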
Step 3
Just like every other nerdy hobby, professional wrestling draws dedicated fans. Wrestling fans often go to cagematch.net to leave reviews for matches, shows, and wrestlers. The following link contains the top 100 matches on cagematch: https://www.cagematch.net/?id=111&view=statistics
What is the correlation between WON ratings and cagematch ratings?
** Which wrestler has the most matches in the top 100?
*** Which promotion has the most matches in the top 100?
**** What is each promotion’s average WON rating?
***** Select any single match and get the comments and ratings for that match into a data frame.
Promotion \
0 New Japan Pro Wrestling
1 Pro Wrestling NOAH
2 New Japan Pro Wrestling
3 New Japan Pro Wrestling
4 New Japan Pro Wrestling
.. ...
95 All Elite Wrestling
96 All Japan Pro Wrestling
97 Ring Of Honor
98 All Elite Wrestling
99 New Japan Pro Wrestling
Match WON Rating Rating
0 Kazuchika Okada vs. Kenny Omega 6.0 9.81
1 Kenta Kobashi vs. Mitsuharu Misawa 5.0 9.80
2 Katsuyori Shibata vs. Kazuchika Okada 5.0 9.78
3 Kenny Omega vs. Will Ospreay 6.25 9.76
4 Kazuchika Okada vs. Kenny Omega 7.0 9.76
.. ... ... ...
95 Cash Wheeler & Dax Harwood vs. Jay White & Jui... 5.25 9.47
96 Kenta Kobashi vs. Steve Williams 4.75 9.47
97 CIMA, Masato Yoshino & Naruki Doi vs. Dragon K... 5.0 9.46
98 Konosuke Takeshita vs. Will Ospreay 5.75 9.46
99 Hiroshi Tanahashi vs. Kenny Omega 5.75 9.46
[100 rows x 4 columns]
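The WON Rating column in the table is already numeric. If the scraped cell instead arrives as a star string (the comments further down use the same notation, e.g. `****1/2`), a small converter is needed first. This is a sketch under that assumption; `stars_to_float` is a hypothetical helper, not part of the original solution.

```python
import re
from fractions import Fraction

# Zero or more stars, optionally followed by a quarter/half/three-quarter fraction.
STAR_PATTERN = re.compile(r"(\*+)?\s*([1-3]/[24])?")


def stars_to_float(text):
    """Convert a star string such as '****1/2' to 4.5; return None if unrated."""
    text = text.strip()
    if not text:
        return None
    match = STAR_PATTERN.fullmatch(text)
    if match is None:
        return None  # anything that is not stars plus an optional fraction
    stars, fraction = match.groups()
    value = float(len(stars or ""))
    if fraction:
        value += float(Fraction(fraction))
    return value
```

Returning `None` for unrated matches keeps them as missing values, which is what the `dropna` call in the correlation step expects.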
q1: What is the correlation between WON ratings and cagematch ratings?
import plotly.express as px
import plotly.io as pio

# Keep only matches that have a WON rating, then correlate the two ratings.
filtered_df = df_matches.dropna(subset=['WON Rating'])
correlation = filtered_df['WON Rating'].corr(filtered_df['Rating'])
print(f"Correlation between WON ratings and cagematch ratings: {correlation}")

# Scatter plot of the two ratings.
fig1 = px.scatter(filtered_df, x='WON Rating', y='Rating', title='Scatter Plot of WON Ratings vs. Cagematch Ratings')
fig1.update_layout(xaxis_title='WON Rating', yaxis_title='Cagematch Rating')
pio.show(fig1)
Correlation between WON ratings and cagematch ratings: 0.3142055145382091
Answer: The correlation between WON ratings and cagematch ratings is 0.31, which indicates a weak positive correlation between the two ratings.
q2: Which wrestler has the most matches in the top 100?
import re
import pandas as pd

# Count how often each "First Last" name appears across the match titles.
name_counts = {}
for match in df_matches['Match']:
    names = re.findall(r'[A-Z][a-z]* [A-Z][a-z]*', match)
    for name in names:
        if name in name_counts:
            name_counts[name] += 1
        else:
            name_counts[name] = 1

name_counts_df = pd.DataFrame(list(name_counts.items()), columns=['Name', 'Count'])
name_counts_df = name_counts_df.sort_values(by='Count', ascending=False)
print(name_counts_df)
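The regex in the count above only catches names shaped like "First Last", so a single-word or all-caps name such as CIMA is missed. One alternative, sketched here as an assumption about how the match titles are delimited, is to split each title on " vs. ", " & ", and ", " and count what is left. `count_wrestlers` is a hypothetical helper.

```python
import re
from collections import Counter

# Delimiters assumed to separate participants in a match title.
SEPARATORS = re.compile(r"\s+vs\.\s+|\s+&\s+|,\s+")


def count_wrestlers(matches):
    """Count appearances per participant across an iterable of match titles."""
    counts = Counter()
    for title in matches:
        for name in SEPARATORS.split(title):
            name = name.strip()
            if name:
                counts[name] += 1
    return counts


sample = [
    "Kazuchika Okada vs. Kenny Omega",
    "Kenny Omega vs. Will Ospreay",
    "Hiroshi Tanahashi vs. Kenny Omega",
    "Kenta Kobashi vs. Steve Williams",
]
print(count_wrestlers(sample).most_common(1))  # [('Kenny Omega', 3)]
```

On the full data this would be called as `count_wrestlers(df_matches['Match'])`. Titles that list a team name with its members in parentheses would still need extra handling.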
q3: Which promotion has the most matches in the top 100?
Promotion
New Japan Pro Wrestling 35
World Wrestling Entertainment 14
All Japan Pro Wrestling 12
All Elite Wrestling 12
Ring Of Honor 8
Pro Wrestling NOAH 6
All Japan Women's Pro-Wrestling 4
World Wonder Ring Stardom 2
Total Nonstop Action Wrestling 1
DDT Pro Wrestling 1
GAEA Japan 1
Lucha Underground 1
Japanese Women Pro-Wrestling Project 1
World Championship Wrestling 1
JTO 1
Name: count, dtype: int64
Answer: New Japan Pro Wrestling has the most matches in the top 100 with 35 matches.
q4: What is each promotion’s average WON rating?
Promotion
All Elite Wrestling 5.5625
World Wonder Ring Stardom 5.5
New Japan Pro Wrestling 5.392857
Japanese Women Pro-Wrestling Project 5.0
Total Nonstop Action Wrestling 5.0
World Championship Wrestling 5.0
All Japan Pro Wrestling 4.979167
Ring Of Honor 4.928571
All Japan Women's Pro-Wrestling 4.916667
World Wrestling Entertainment 4.892857
Pro Wrestling NOAH 4.791667
JTO 4.75
DDT Pro Wrestling NaN
GAEA Japan NaN
Lucha Underground NaN
Name: WON Rating, dtype: object
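The `dtype: object` in the output above suggests the WON Rating column was not stored as plain floats. A sketch of one way to get both the match counts and the averages in a single pass, coercing the ratings to numbers first; `promotion_summary` and its output column names are hypothetical.

```python
import pandas as pd


def promotion_summary(df):
    """Count matches and average the WON rating for each promotion."""
    # Non-numeric or missing ratings become NaN and are skipped by mean().
    ratings = pd.to_numeric(df["WON Rating"], errors="coerce")
    return (
        df.assign(**{"WON Rating": ratings})
        .groupby("Promotion")["WON Rating"]
        .agg(matches="size", avg_won="mean")
        .sort_values("avg_won", ascending=False)
    )
```

Promotions whose matches have no WON rating at all keep a NaN average and sort to the bottom, which matches the three NaN rows in the output.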
q5: Select any single match and get the comments and ratings for that match into a data frame.
User \
0 RealTeflonDon
1 cactus
2 TH0810
3 WilsonDrove
4 archerinfection
.. ...
344 hatebreeder
345 GamePrince
346 The Denniz
347 The Rated R Superstar EDGE
348 KASH
Comment Date Rating
0 Time stopped for me when this match took plac... 31.01.2025 10.0
1 An absolute gem and a kick up the arse to WWE... 31.01.2025 10.0
2 There are very few matches that get every asp... 30.01.2025 10.0
3 For almost 11 years this was the last WWE mai... 21.01.2025 10.0
4 One of the greatest matches of all time, one ... 16.01.2025 10.0
.. ... ... ...
344 Freakin' Match of the Year! Das Match war an ... 18.07.2011 10.0
345 Eins der besten Singles-Matches von John Cena... 18.07.2011 9.0
346 Unglaubliche Crowd. Gänsehaut put. Match of t... 18.07.2011 10.0
347 Sehr sehr gutes Match und eindeutig Match of ... 18.07.2011 10.0
348 ****1/2 - Match of the Year 2011. Ich ging mi... 18.07.2011 10.0
[349 rows x 4 columns]
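A sketch of how one comment could be split into the four columns shown above. It assumes, without confirmation from the page itself, that each comment has a header reading like `cactus wrote on 31.01.2025:` and a body reading like `[10.0] "comment text"`; both text shapes and the helper `parse_comment` are assumptions, and fetching the header and body text from the match page is left out.

```python
import re

# Assumed header shape: "<user> wrote on DD.MM.YYYY:"
HEADER = re.compile(r"^(?P<user>.+?) wrote on (?P<date>\d{2}\.\d{2}\.\d{4})")
# Assumed body shape: optional "[rating]" prefix, then the quoted comment.
BODY = re.compile(
    r'^\s*(?:\[(?P<rating>\d+(?:\.\d+)?)\]\s*)?"?(?P<comment>.*?)"?\s*$',
    re.DOTALL,
)


def parse_comment(header_text, body_text):
    """Turn one comment's header and body text into a row dictionary."""
    header = HEADER.match(header_text.strip())
    body = BODY.match(body_text)
    rating = body.group("rating")
    return {
        "User": header.group("user") if header else None,
        "Comment": body.group("comment"),
        "Date": header.group("date") if header else None,
        "Rating": float(rating) if rating else None,
    }
```

A list of these dictionaries can be passed to `pd.DataFrame` to get the User, Comment, Date, and Rating columns; comments left without a rating end up with a missing Rating.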
Step 4
You can’t have matches without wrestlers. The following link contains the top 100 wrestlers, according to cagematch: https://www.cagematch.net/?id=2&view=statistics
*** Of the top 100, who has wrestled the most matches?
***** Of the top 100, which wrestler has the best win/loss?
import re
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Collect the name and cagematch ID of each wrestler in the top 100.
link = 'https://www.cagematch.net/?id=2&view=statistics'
hot100_req = requests.get(link)
hot100_soup = BeautifulSoup(hot100_req.content, 'html.parser')
all_links = hot100_soup.select('.TCol a')
# Keep only links whose href has exactly two '&' (intended to be the wrestler profile links).
filtered_links = [a for a in all_links if a['href'].count('&') == 2]

df_wrestlers = []
for i in filtered_links:
    wrestler_name = i.text.strip()
    wrestler_href = i['href']
    ID = re.search(r'nr=(\d+)', wrestler_href).group(1)
    df_wrestlers.append({"Name": wrestler_name, "ID": ID})
df_wrestlers = pd.DataFrame(df_wrestlers)

# Visit each wrestler's match statistics page (page=22) and read the four totals.
match_stats = []
for i in df_wrestlers['ID']:
    link = f'https://www.cagematch.net/?id=2&nr={i}&page=22'
    hot100_req = requests.get(link)
    hot100_soup = BeautifulSoup(hot100_req.content, 'html.parser')
    wrestler_stats = hot100_soup.select('.InformationBoxContents')
    matches = wrestler_stats[0].text
    wins = wrestler_stats[1].text
    losses = wrestler_stats[2].text
    draws = wrestler_stats[3].text
    match_stats.append({"Matches": matches, "Wins": wins, "Losses": losses, "Draws": draws})
match_stats = pd.DataFrame(match_stats)

# Each cell holds text around the count, so keep only the first integer.
extract_number = lambda x: int(re.search(r'\d+', x).group())
for col in ['Matches', 'Wins', 'Losses', 'Draws']:
    match_stats[col] = match_stats[col].apply(extract_number)

df_wrestlers_stats = pd.concat([df_wrestlers, match_stats], axis=1)
q1: Of the top 100, who has wrestled the most matches?
q2: Of the top 100, which wrestler has the best win/loss?
Name ID Matches Wins Losses Draws Win/Loss
25 Gene Okerlund 1383 4 4 0 0 inf
14 Lou Thesz 930 4340 3204 339 797 9.451327
84 Antonio Inoki 1096 3688 2929 459 300 6.381264
62 Bruno Sammartino 243 2047 1546 276 225 5.601449
29 Karl Gotch 946 553 370 94 89 3.936170
.. ... ... ... ... ... ... ...
87 Sami Zayn 1523 1772 795 947 30 0.839493
23 Bobby Heenan 770 1033 265 717 51 0.369596
17 Paul Heyman 664 65 6 55 4 0.109091
65 Cesar Duran 9189 1 0 1 0 0.000000
22 Hiroyuki Unno 7522 0 0 0 0 NaN
[100 rows x 7 columns]
Answer: The wrestler with the best win/loss is technically Gene Okerlund, with a ratio of infinity since he has never lost a match. However, he only fought 4 matches. The best win/loss ratio other than his belongs to Lou Thesz, who has wrestled in 4340 matches, with a win/loss ratio of 9.45.
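The Gene Okerlund case can be handled explicitly rather than by eye. The sketch below computes the ratio with the zero-loss case spelled out and drops wrestlers under a minimum number of matches before ranking. The threshold of 100 matches and the function names are assumptions chosen for illustration; the rows come from the table above.

```python
import math


def win_loss_ratio(wins, losses):
    """Wins divided by losses; infinite with no losses, NaN with no decisions."""
    if losses == 0:
        return math.inf if wins > 0 else math.nan
    return wins / losses


def rank_by_win_loss(records, min_matches=100):
    """Rank wrestlers with at least `min_matches` matches by win/loss ratio."""
    eligible = [
        r for r in records
        if r["Matches"] >= min_matches and r["Wins"] + r["Losses"] > 0
    ]
    return sorted(
        eligible,
        key=lambda r: win_loss_ratio(r["Wins"], r["Losses"]),
        reverse=True,
    )


records = [
    {"Name": "Gene Okerlund", "Matches": 4, "Wins": 4, "Losses": 0},
    {"Name": "Lou Thesz", "Matches": 4340, "Wins": 3204, "Losses": 339},
    {"Name": "Antonio Inoki", "Matches": 3688, "Wins": 2929, "Losses": 459},
    {"Name": "Paul Heyman", "Matches": 65, "Wins": 6, "Losses": 55},
    {"Name": "Hiroyuki Unno", "Matches": 0, "Wins": 0, "Losses": 0},
]
ranked = rank_by_win_loss(records)
print([r["Name"] for r in ranked])  # ['Lou Thesz', 'Antonio Inoki']
```

With the threshold in place, Lou Thesz comes out on top directly, which agrees with the answer above.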
Step 5
With all of this work out of the way, we can start getting down to strategy.
First, what talent should WWE pursue? Advise carefully.
Answer: WWE should pursue Kenny Omega (from AEW), who has the most matches in the top 100 and has a strong following among wrestling fans. He is English-speaking and is extremely consistent both in the ring and on the mic. The only issue is that he is one of the best, so he could be expensive or hard to land. Maybe with the move to Netflix they can reach an agreement. It is also worth noting that TNA and WWE have a deal, so there is no need to pursue TNA talent, as they can move between the two companies.
Second, reconcile what you found in steps 3 and 4 with Netflix’s relationship with WWE. Use the data from the following page to help make your case: https://wrestlenomics.com/tv-ratings/
Answer: Viewership skyrocketed after the premiere on Netflix, especially for Raw. It has gone down since but still averages about a million more than before. It is still a new relationship, but it has been a very good partnership so far. It has also helped WWE expand globally. The good thing is that CM Punk and John Cena, who share one of the top 100 matches, are both currently active with the company. The storylines in WWE also help push it further than the other promotions. Lastly, after the Royal Rumble many stars returned to WWE. One of them is Charlotte Flair, the daughter of Ric Flair, who is the top 100 wrestler with the most matches. The star power of the name also brings eyes in, especially as she has had an illustrious career of her own.
Third, do you have any further recommendations for WWE?
Answer: WWE should continue its focus on storylines, as this creates a returning customer base. It should also look into how to bolster WWE SmackDown, since viewership for SmackDown has decreased since Raw's deal with Netflix. WWE is already trying to address this by mixing talent across the brands within the company, but it could build more compelling storylines around SmackDown (which I believe it is working towards, especially with Jacob Fatu slowly becoming a singles star).